Faster Algorithms for String Matching Problems: Matching the Convolution Bound

Author

  • Piotr Indyk
Abstract

In this paper we give a randomized O(n log n)-time algorithm for the string matching with don't cares problem. This improves the Fischer-Paterson bound [10] from 1974 and answers the open problem posed (among others) by Weiner [30] and Galil [11]. Using the same technique, we give an O(n log n)-time algorithm for other problems, including subset matching and tree pattern matching [15, 21, 9, 7, 17] and (general) approximate threshold matching [28, 17]. As this bound essentially matches the complexity of computing the Fast Fourier Transform, which is the only known technique for solving problems of this type, it is likely that the algorithms are in fact optimal. Additionally, the technique used for the threshold matching problem can be applied to the on-line version of this problem, in which we are allowed to preprocess the text and are required to process the pattern in time sublinear in the text length. This result involves an interesting variant of the Karp-Rabin fingerprint method [22] in which hash functions are locality-sensitive [25], i.e. the probability of collision of two words depends on the distance between them.
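To illustrate why convolution (and hence the FFT) is the natural tool for matching with don't cares, here is a minimal Python sketch. It is not the randomized algorithm of this paper; it uses a well-known deterministic reformulation in which a wildcard is encoded as 0 and an alignment i matches exactly when the sum over j of p_j * t_(i+j) * (p_j - t_(i+j))^2 is zero. The function names (`fft`, `correlate`, `wildcard_match`), the `?` wildcard symbol, and the character encoding are assumptions made for this illustration.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def correlate(text, pattern):
    """Return c[i] = sum_j pattern[j] * text[i + j] for every alignment i."""
    n, m = len(text), len(pattern)
    size = 1
    while size < n + m:
        size *= 2
    fa = fft(list(text) + [0] * (size - n))
    fb = fft(list(reversed(pattern)) + [0] * (size - m))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse above is unnormalised, so divide by size.
    # Entry m - 1 + i of the convolution corresponds to alignment i.
    return [round(prod[m - 1 + i].real / size) for i in range(n - m + 1)]

def wildcard_match(text, pattern, wildcard="?"):
    """Return all alignments where pattern matches text; wildcard matches anything."""
    # Hypothetical encoding: wildcard -> 0, every other character -> small positive int.
    alphabet = sorted(set(text + pattern) - {wildcard})
    code = {ch: idx + 1 for idx, ch in enumerate(alphabet)}
    code[wildcard] = 0
    t = [code[ch] for ch in text]
    p = [code[ch] for ch in pattern]
    # Expand sum_j p t (p - t)^2 = sum p^3 t - 2 sum p^2 t^2 + sum p t^3.
    a = correlate(t, [y ** 3 for y in p])
    b = correlate([x ** 2 for x in t], [y ** 2 for y in p])
    c = correlate([x ** 3 for x in t], p)
    return [i for i in range(len(a)) if a[i] - 2 * b[i] + c[i] == 0]
```

Each term of the sum is non-negative, so the total is zero only when every position either agrees or involves a wildcard. The whole test therefore costs three FFT-based correlations, which is the convolution cost that the title refers to. The floating-point FFT is adequate for short inputs like those in this sketch; long texts or large alphabets would call for an exact transform.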

Similar resources

A Comparative Study of Wu Manber String Matching Algorithm and its Variations

String matching algorithms have become one of the most important topics in computer science. These algorithms are used in many real-world problems, such as scanning for threats in intrusion detection systems, finding patterns in text mining, matching document similarity in plagiarism detection systems, recognition in bioinformatics, and so on. String Matching Algorithms are br...

Modulated string searching

In his 1987 paper "Generalized String Matching", Abrahamson introduced the concept of pattern matching with character classes and provided the first efficient algorithm to solve this problem. The best known solution to date is due to Linhart and Shamir (2009). Another broad yet comparatively less intensively studied class of string matching problems is numerical string searching, such as ...

Faster Filters for Approximate String Matching

We introduce a new filtering method for approximate string matching called the suffix filter. It has some similarity with well-known filtration algorithms, which we call factor filters, and which are among the best practical algorithms for approximate string matching using a text index. Suffix filters are stronger, i.e., produce fewer false matches than factor filters. We demonstrate experiment...

Towards Faster String Matching

Aalto University, P.O. Box 11000, FI-00076 Aalto, www.aalto.fi. Author: Hannu Peltola. Name of the doctoral dissertation: Towards Faster String Matching. Publisher: Aalto University School of Science. Unit: Department of Computer Science. Series: Aalto University publication series DOCTORAL DISSERTATIONS 78/2013. Field of research: Software Technology. Manuscript submitted: 11 December 2012. Date of the defenc...

Cell-probe bounds for online edit distance and other pattern matching problems

We give cell-probe bounds for the computation of edit distance, Hamming distance, convolution and longest common subsequence in a stream. In this model, a fixed string of n symbols is given and one δ-bit symbol arrives at a time in a stream. After each symbol arrives, the distance between the fixed string and a suffix of most recent symbols of the stream is reported. The cell-probe model is per...

Publication date: 1998